Baseline adaptive randomization
Steps to conduct baseline adaptive randomization
- Assume there are $k$ baseline covariates chosen as the stratification factors, each of which would have several levels
- At the beginning of the randomization, since there isn't any patients yet, simple randomization will be used for the first $m$ patients ($m$ = No. of treatment groups)
- After Step 2, for every new incoming patient, calculate the measure of imbalance for assigning that patient to every possible treatment
- Generate the probability of biased coin for treatment assignment based on the measure of imbalance from Step 3: group that leads to larger imbalance score should receive smaller probability of having a patient being assigned
Measure of imbalance
i.
$$ B(t) =\sum_{f=1}^F w_f*\mathcal{F}(n^1_{f}, n^2_{f}, n^{t}_{f}+1, \cdots, n^K_{f} ), t=1, 2, \cdots, K $$- $B(t)$ standards for the imbalance score if the new subject is randomized to arm $t$
- $f$ is the indicator for randomization factor; $t$ is the indicator for treatment arm
- $\mathcal{F}$ the function to quantify imbalance can be
variance,range, etc.
ii. max $\chi$-square statistic among all the stratification factors
Sample code
library(arsenal)
# N: total sample size for randomization
# treatment groups: LETTERS[1:K]
# baseline stratification factors (and levels within each factor)
# function to generate a sample for randomization
gen_sample = function(n = 200, prob_age = c(0.2, 0.3, 0.5), prob_sex = 0.5, prob_occ = 0.6, prob_site = 0.6){
SEX = as.factor(rbinom(n, 1, prob = prob_sex) + 1)
OCCLUTION = as.factor(rbinom(n, 1, prob = prob_occ) + 1)
AGE = as.factor(sample(length(prob_age), n, prob_age, replace = TRUE))
SITE = as.factor(rbinom(n, 1, prob = prob_site) + 1)
data.frame(AGE = AGE, SEX = SEX, OCCLUTION = OCCLUTION, SITE = SITE)
}
# function to assign group for new patients based on biased coin probability calculated from all assigned pats
assign = function(trt_grp, prob){
return(sample(trt_grp, 1, replace = FALSE, prob))
}
# Baseline Adaptive Randomization using biased coin method
BAR = function(df, n_trt = 2, prob_bc = 2/3){
# df: a sample to be randomized
# n_trt: number of treatment arms
# prob_bc: biased coin probability when there is imbalance between two treatment arms
trt_grp = LETTERS[1:n_trt]
df$trt = NA
i = 1
max_x_level = max(sapply(df, function(x) length(levels(x)))) # find the maximum length of levels in all x variables
while(i <= 1){
new_trt = assign(trt_grp, prob = c(0.5, 0.5))
df$trt[i] = new_trt
i = i + 1
}
while(i <= nrow(df)){
# print(i)
chisq_stat = range = rep(Inf, length(trt_grp))
for (j in seq_along(trt_grp))
{
df$trt[i] = trt_grp[j]
# chisq_stat[j] = max(sapply(df[, names(df) != 'trt'], function(x) chisq.test(table(x, df$trt))$statistic))
range[j] = sum(sapply(df[, names(df) != 'trt'], function(x) sum(apply(table(x, df$trt), 1, function(y) max(y) - min(y))))) # measure the absolute difference in terms of cell frequencies between two arms for all potential treatment assignment (a.k.a range method)
# print(range)
}
# prob_assign = ifelse(chisq_stat[1] == chisq_stat[2], 0.5, ifelse(chisq_stat[1] < chisq_stat[2], prob_bc, 1-prob_bc))
prob_assign = ifelse(range[1] == range[2], 0.5, ifelse(range[1] < range[2], prob_bc, 1 - prob_bc)) # assign higher probability to assignment leads to smaller imbalance
df$trt[i] = assign(trt_grp, c(prob_assign, 1-prob_assign)) # do random draw of the treatment for the patient to be randomized
i = i + 1
}
return(df)
}
# a sample simulation to check the algorithm
test_sample = gen_sample(200, prob_age = c(0.3, 0.3, 0.4), prob_sex = 0.5, prob_site = 0.8, prob_occ = 0.7)
temp = BAR(test_sample, prob_bc = 0.8) # 0.8/0.2 biased coin probability split for two-arm study
summary(tableby(trt ~ ., data = temp), text = TRUE)
# the program works pretty well on balancing equal allocation of treatment overall and within each factor
# parameters that we can tune in this program: prob_bc, the probability for levels within each
# stratification factor; or may be the measure of difference between two arms (so far, frequency
# difference works better than chisq statistic for the simulation observation)